GEOMETRY OF MAXIMUM LIKELIHOOD ESTIMATION
IN GAUSSIAN GRAPHICAL MODELS 

By Caroline Uhler 

Institute of Science and Technology Austria 

We study maximum likelihood estimation in Gaussian graphical models from a geometric point of view. An algebraic elimination criterion allows us to find exact lower bounds on the number of observations needed to ensure that the maximum likelihood estimator (MLE) exists with probability one. This is applied to bipartite graphs, grids and colored graphs. We also study the ML degree, and we present the first instance of a graph for which the MLE exists with probability one, even when the number of observations equals the treewidth.

1. Introduction. In current statistical applications, we are often faced with problems involving a large number of random variables but only a small number of observations (e.g., [15], Chapter 18). This problem arises, for example, when studying genetic networks: we seek a model that potentially involves a vast number of genes, while we are given gene expression data for only a few individuals. Gaussian graphical models have frequently been used to study gene association networks, and the maximum likelihood estimator (MLE) of the covariance matrix is computed to describe the interactions between different genes (e.g., [19, 22]). The following question is therefore of great interest from both an applied and a theoretical point of view: what is the minimum number of observations needed to guarantee the existence of the MLE in a Gaussian graphical model? It is well known that the MLE exists with probability one if the number of observations is at least as large as the number of variables. In this paper we examine the case of fewer observations.

Gaussian graphical models were introduced by Dempster [8] under the name of covariance selection models. Subsequently, the graphical representation of these models gained in importance. Lauritzen [17] and Whittaker [21] give introductions to graphical models in general and discuss the connection between graph and probability distribution for Gaussian graphical models.

Gaussian graphical models are regular exponential families. The statistical 
theory of exponential families, as presented, for example, by Brown [5] or 
Barndorff-Nielsen [2], is a strong tool to establish existence and uniqueness 
of the MLE. The MLE exists and is unique if and only if the sufficient 
statistic lies in the interior of its convex support. We will give a geometric 
description of the convex support of the sufficient statistics and discuss the 
connection to the number of samples. 

This paper is organized as follows. In Section 2, we explain the connection between maximum likelihood estimation in Gaussian graphical models and positive definite matrix completion problems. In Section 3, we give a geometric description of the problem, and we develop an exact algebraic algorithm to determine lower bounds on the number of observations needed to ensure existence of the MLE with probability one. In Section 4, we discuss the existence of the MLE for bipartite graphs. Section 5 deals with small graphs. The 3 × 3 grid motivated this paper and is the original problem posed by Steffen Lauritzen during his lecture on the existence of the MLE in Gaussian graphical models at the "Durham Symposium on Mathematical Aspects of Graphical Models" on July 8, 2008. The 3 × 3 grid is also the first example of a graph for which the MLE exists with probability one even when the number of observations equals the treewidth of the underlying graph. We conclude this paper with a characterization of Gaussian models on colored 4-cycles in Section 6.

2. Positive definite matrix completion. Let $G = ([m], E)$ be an undirected graph on the vertex set $[m] = \{1, \dots, m\}$ with edge set $E$. To simplify notation, we assume that $E$ contains all self-loops, that is, $(i, i) \in E$ for all $i \in [m]$. Let $q$ denote the maximal clique size of $G$. A graph $G$ is chordal if it contains no chordless cycle of length greater than 3. For a nonchordal graph $G = ([m], E)$ one can define a chordal cover $G^+ = ([m], E^+)$, which is a chordal graph satisfying $E \subseteq E^+$. We denote its maximal clique size by $q^+$. It is useful to introduce the notion of a minimal chordal cover $G^* = ([m], E^*)$, where minimality refers to the maximal clique size in the chordal cover, that is, $q^* = \min(q^+)$, the minimum being taken over all chordal covers of $G$. The treewidth $\tau(G)$ of a graph $G$ is defined as

$$\tau(G) = q^* - 1.$$

A random vector $X$ taking values in $\mathbb{R}^m$ is said to satisfy the Gaussian graphical model with graph $G$ if $X$ follows a multivariate normal distribution obeying the undirected pairwise Markov property (e.g., [17, 21]). Assuming the mean to be zero, this property is as follows:

(1) $X \sim \mathcal{N}(0, \Sigma)$, $\Sigma$ positive definite with $(\Sigma^{-1})_{ij} = 0$ for all $(i, j) \notin E$.
The results in this paper are based on the assumption that the mean is a known vector; in particular, we study the case where the mean is zero. The case where the mean is unknown or only partially known is more complex, since mean and covariance matrix cannot in general be estimated independently. Gehrmann and Lauritzen [11] describe symmetry relations on the underlying graph which ensure that the mean vector can be estimated independently of the true covariance matrix $\Sigma$.

We denote by $\mathbb{S}^m$ the set of symmetric $m \times m$ matrices and by $\mathbb{S}^m_{\succ 0}$ the open convex cone of positive definite matrices. For a matrix $M \in \mathbb{S}^m$ let $M_G$ denote the $G$-partial matrix consisting of all entries of $M$ corresponding to edges in the graph $G$, that is,

$$M_G = (M_{ij} \mid (i, j) \in E).$$

In particular, all diagonal entries of the partial matrix $M_G$ are specified, because we assume that the edge set $E$ contains all self-loops. Equivalently, $M_G$ is the projection of $M$ onto the (coordinates indexed by the) edge set of the graph $G$:

$$\pi_G : \mathbb{S}^m \to \mathbb{R}^E, \quad M \mapsto M_G.$$
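The projection $\pi_G$ simply keeps the diagonal and the entries indexed by edges. A minimal Python sketch, assuming 0-based vertex labels and representing $M_G$ as a dictionary keyed by index pairs with $i \le j$; the helper name `project_to_graph` is illustrative and not from the paper.

```python
import numpy as np

def project_to_graph(M, edges):
    """Return the G-partial matrix M_G as a dict {(i, j): M[i, j]} with i <= j.

    `edges` lists the non-loop edges of G on vertices 0, ..., m-1.  Every
    self-loop (i, i) is added automatically, matching the convention that E
    contains all self-loops, so the diagonal of M is always kept.
    """
    m = M.shape[0]
    keys = {(i, i) for i in range(m)}
    keys |= {(min(i, j), max(i, j)) for i, j in edges}
    return {key: M[key] for key in sorted(keys)}


# Usage on the 4-cycle 0-1-2-3-0: the non-edges (0, 2) and (1, 3) are dropped.
A = np.arange(16.0).reshape(4, 4)
M = A + A.T                       # any symmetric 4 x 4 matrix
M_G = project_to_graph(M, [(0, 1), (1, 2), (2, 3), (3, 0)])
```

For the 4-cycle, $M_G$ has 8 coordinates: 4 diagonal entries and 4 edge entries.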

Let $X_1, \dots, X_n$ denote $n$ independent draws from the distribution $\mathcal{N}(0, \Sigma)$. Then the sample covariance matrix is given by

$$S = \frac{1}{n} \sum_{i=1}^{n} X_i X_i^T.$$
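Since $S$ is a sum of $n$ rank-one matrices, its rank is at most $n$; for Gaussian data with $\Sigma$ positive definite it equals $\min(n, m)$ with probability one. This is the rank constraint that drives the rest of the paper. A small sketch, assuming $\Sigma = I_m$ for the simulated draws (any positive definite $\Sigma$ gives the same ranks almost surely):

```python
import numpy as np

def sample_covariance(X):
    """Zero-mean sample covariance S = (1/n) * sum_i X_i X_i^T.

    X has shape (n, m): row i is the i-th observation X_i in R^m.
    """
    n = X.shape[0]
    return X.T @ X / n


rng = np.random.default_rng(0)
m = 5
for n in (2, 3, 5, 8):
    X = rng.standard_normal((n, m))       # n draws from N(0, I_m)
    S = sample_covariance(X)
    # With probability one the rank is min(n, m): S is singular whenever n < m.
    print(n, np.linalg.matrix_rank(S))
```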

The $G$-partial sample covariance matrix $S_G$ plays an important role when studying the existence of the MLE, as seen in the following theorem, first proven by Dempster [8].

Theorem 2.1. In the Gaussian graphical model on $G$, the MLE of the covariance matrix $\Sigma$ exists if and only if the $G$-partial sample covariance matrix $S_G$ can be completed to a positive definite matrix. Then the MLE $\hat\Sigma$ is the unique completion satisfying $(\hat\Sigma^{-1})_{ij} = 0$ for all $(i, j) \notin E$.
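Theorem 2.1 characterizes the MLE but does not say how to compute it. One standard method from the graphical-models literature, not used in this paper, is iterative proportional scaling (IPS): cycle over the maximal cliques $C$ and replace $K_{CC}$ by $K_{CC} + S_{CC}^{-1} - ((K^{-1})_{CC})^{-1}$, which makes the clique marginal of $K^{-1}$ match $S_{CC}$ while leaving all non-edge entries of $K$ at zero. A minimal sketch, assuming the caller supplies the maximal cliques and that the MLE exists; the function name `ips_mle` and the stopping rule are illustrative choices.

```python
import numpy as np

def ips_mle(S, cliques, max_sweeps=1000, tol=1e-12):
    """Iterative proportional scaling for a Gaussian graphical model.

    S       : sample covariance matrix (m x m)
    cliques : list of vertex lists, the maximal cliques of G
    Returns (Sigma_hat, K_hat).  Each update only touches entries inside a
    clique, so K stays exactly zero at every non-edge.
    """
    m = S.shape[0]
    K = np.eye(m)                      # diagonal start: lies in the model
    for _ in range(max_sweeps):
        K_prev = K.copy()
        for C in cliques:
            idx = np.ix_(C, C)
            Sigma = np.linalg.inv(K)
            # After this update, (K^{-1})_CC equals S_CC.
            K[idx] += np.linalg.inv(S[idx]) - np.linalg.inv(Sigma[idx])
        if np.max(np.abs(K - K_prev)) < tol:
            break
    return np.linalg.inv(K), K


rng = np.random.default_rng(1)
X = rng.standard_normal((5, 4))        # n = 5 observations, m = 4 variables
S = X.T @ X / 5
cycle = [[0, 1], [1, 2], [2, 3], [0, 3]]   # maximal cliques of the 4-cycle
Sigma_hat, K_hat = ips_mle(S, cycle)
```

The output illustrates both conditions of Theorem 2.1: $\hat\Sigma$ agrees with $S$ on the edges and the diagonal, and $\hat K = \hat\Sigma^{-1}$ vanishes at the non-edges $(0,2)$ and $(1,3)$.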

So checking existence of the MLE in a Gaussian graphical model is a special matrix completion problem with a rank constraint: the partial matrix $S_G$ is the projection of a matrix of rank at most $n$, the number of observations. Matrix completion problems have been studied extensively, and the following result from [14] is very useful in this context.

Theorem 2.2. For a graph $G$ the following statements are equivalent:

(i) A $G$-partial matrix $M_G \in \mathbb{R}^E$ has a positive definite completion if and only if all submatrices corresponding to maximal cliques in $M_G$ are positive definite.

(ii) $G$ is chordal.
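Whether Theorem 2.2 applies therefore hinges on chordality, which can be tested in linear time. The sketch below uses maximum cardinality search, a standard algorithm due to Tarjan and Yannakakis that is not part of this paper: the reverse of the search order is a perfect elimination ordering exactly when the graph is chordal. Vertices are assumed to be labeled $0, \dots, m-1$; the helper names are illustrative.

```python
def adjacency(m, edges):
    """Neighbour sets for an undirected graph on vertices 0, ..., m-1."""
    adj = {v: set() for v in range(m)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def is_chordal(adj):
    """Chordality test via maximum cardinality search.

    Number vertices greedily, always choosing a vertex with the most numbered
    neighbours; G is chordal iff the reverse of this order is a perfect
    elimination ordering, i.e. the later neighbours of every vertex form a clique.
    """
    weight = {v: 0 for v in adj}
    unnumbered = set(adj)
    mcs_order = []
    while unnumbered:
        v = max(unnumbered, key=lambda u: weight[u])
        mcs_order.append(v)
        unnumbered.remove(v)
        for u in adj[v]:
            if u in unnumbered:
                weight[u] += 1
    peo = mcs_order[::-1]
    pos = {v: k for k, v in enumerate(peo)}
    for v in peo:
        later = [u for u in adj[v] if pos[u] > pos[v]]
        if later:
            # It suffices to check adjacency to the earliest later neighbour.
            parent = min(later, key=lambda u: pos[u])
            if not set(later) - {parent} <= adj[parent]:
                return False
    return True


cycle4 = adjacency(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
k23 = [(a, b) for a in (0, 1) for b in (2, 3, 4)]     # bipartite graph K_{2,3}
```

For example, the 4-cycle and $K_{2,3}$ are not chordal, while adding the single edge between the two distinguished vertices of $K_{2,3}$ makes it chordal.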
By combining Theorems 2.1 and 2.2 we get the following result about the existence of the MLE in Gaussian graphical models (see also [6]).

COROLLARY 2.3. If $n \geq q^*$, the MLE exists with probability one. If $n < q$, the MLE does not exist.
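For a chordal graph ($q^* = q$), Theorem 2.2 reduces the existence question to checking the clique blocks of $S$, so the threshold in Corollary 2.3 is sharp. A sketch, assuming the maximal cliques of the chordal graph are known and using a small numerical tolerance to declare a block singular; the helper names and the choice $\Sigma = I$ for simulation are illustrative.

```python
import numpy as np

def mle_exists_chordal(S, cliques, tol=1e-10):
    """Theorem 2.2 for a chordal graph: S_G has a positive definite completion
    (so, by Theorem 2.1, the MLE exists) iff every maximal-clique block of S is
    positive definite.  `cliques` must list the maximal cliques of a chordal G."""
    for C in cliques:
        block = S[np.ix_(C, C)]
        if np.linalg.eigvalsh(block).min() <= tol:
            return False
    return True


def sample_cov(n, m, rng):
    X = rng.standard_normal((n, m))    # n observations from N(0, I_m)
    return X.T @ X / n


path = [[0, 1], [1, 2]]      # path 0-1-2: chordal, q = q* = 2
triangle = [[0, 1, 2]]       # complete graph on 3 vertices: q = q* = 3
rng = np.random.default_rng(3)
for n in (1, 2, 3):
    print(n, mle_exists_chordal(sample_cov(n, 3, rng), path),
          mle_exists_chordal(sample_cov(n, 3, rng), triangle))
```

On the path the MLE appears from $n = 2$ observations on, on the triangle only from $n = 3$, matching $n \ge q^* = q$.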

Note that chordal graphs have $q^* = q$. Therefore, for chordal graphs, existence of the MLE depends only on the number of observations. For nonchordal graphs, however, there is a gap $q \leq n < q^*$ in which existence of the MLE is not well understood. Cycles and wheels (cycles with one additional vertex connected to all cycle vertices) are the only nonchordal graphs that have been studied [3, 4, 6]. We will extend the results on cycles and wheels to bipartite graphs $K_{2,m}$ and small grids.

3. Geometry of maximum likelihood estimation in Gaussian graphical models. Every concentration matrix (i.e., inverse of a covariance matrix) in a Gaussian graphical model satisfies the undirected pairwise Markov property (1). The set of all concentration matrices in the model is a convex cone

$$\mathcal{K}_G := \{K \in \mathbb{S}^m_{\succ 0} \mid K_{ij} = 0 \text{ for all } (i, j) \notin E\}.$$

Note again that the edge set contains all self-loops, that is, $(i, i) \in E$ for all $i \in [m]$. By taking the inverse of every matrix in $\mathcal{K}_G$, we get the set of all covariance matrices in the model, denoted by $\mathcal{K}_G^{-1}$. This is an algebraic variety intersected with the positive definite cone $\mathbb{S}^m_{\succ 0}$ and is shown in purple in Figure 1.
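The cone $\mathcal{K}_G$ is cut out by linear equations, whereas its image $\mathcal{K}_G^{-1}$ generally has no zeros at all: inverting a sparse concentration matrix fills in the non-edges. A small check on the 4-cycle, where the particular matrix `K` is an illustrative choice:

```python
import numpy as np

def in_concentration_cone(K, edges, tol=1e-12):
    """Check K in K_G: symmetric, positive definite and zero at every non-edge."""
    m = K.shape[0]
    allowed = {(i, i) for i in range(m)}
    allowed |= {(i, j) for i, j in edges} | {(j, i) for i, j in edges}
    if not np.allclose(K, K.T):
        return False
    for i in range(m):
        for j in range(m):
            if (i, j) not in allowed and abs(K[i, j]) > tol:
                return False
    return bool(np.linalg.eigvalsh(K).min() > 0)


cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
K = np.array([[1.0, 0.3, 0.0, 0.3],
              [0.3, 1.0, 0.3, 0.0],
              [0.0, 0.3, 1.0, 0.3],
              [0.3, 0.0, 0.3, 1.0]])
Sigma = np.linalg.inv(K)     # a point of K_G^{-1}
```

Here $K$ is circulant with eigenvalues $1.6, 1, 0.4, 1$, so it lies in $\mathcal{K}_G$, while $\Sigma_{13} = 0.28125 \neq 0$ although $(1,3)$ is a non-edge (0-based entry `Sigma[0, 2]`).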

[Figure 1 panels: concentration matrices $K$; covariance matrices $\Sigma$.]

Fig. 1. Geometry of maximum likelihood estimation in Gaussian graphical models. The cone $\mathcal{K}_G$ consists of all concentration matrices in the model, and $\mathcal{K}_G^{-1}$ is the corresponding set of covariance matrices. The cone of sufficient statistics $\mathcal{C}_G$ is defined as the projection of $\mathbb{S}^m_{\succ 0}$ onto the edge set of $G$. It is dual to $\mathcal{K}_G$. Given a sample covariance matrix $S$, $\mathrm{fiber}_G(S)$ consists of all positive definite completions of the $G$-partial matrix $S_G$, and it intersects $\mathcal{K}_G^{-1}$ in at most one point, namely the MLE $\hat\Sigma$.
In a Gaussian graphical model, the $G$-partial matrix $S_G$ of a sample covariance matrix $S$ is a minimal sufficient statistic (e.g., [17, 21]). So Theorem 2.1 has the following geometric interpretation, also illustrated in Figure 1:

Corollary 3.1. The MLEs $\hat\Sigma$ and $\hat K$ exist for a given sample covariance matrix $S$ if and only if

$$\mathrm{fiber}_G(S) := \{\Sigma \in \mathbb{S}^m_{\succ 0} \mid \Sigma_G = S_G\}$$

is nonempty, in which case $\mathrm{fiber}_G(S)$ intersects $\mathcal{K}_G^{-1}$ in exactly one point, namely the MLE $\hat\Sigma$.

So the MLE $\hat\Sigma$ has an algebraic description in terms of the sufficient statistic $S_G$, that is, $\hat\Sigma$ can be represented as a solution to polynomial equations in $S_G$. The number of complex solutions of these equations for generic data, equivalently the degree of the algebraic function taking $S$ to $\hat\Sigma$, is called the ML degree. The ML degree thus measures the complexity of the map taking a sample covariance matrix $S$ to its maximum likelihood estimate $\hat\Sigma$; it is studied in more detail in Section 4.

Applying Corollary 3.1, we can describe the set of all sufficient statistics for which the MLE exists. We denote this set by $\mathcal{C}_G$. It is given by the projection of the positive definite cone $\mathbb{S}^m_{\succ 0}$ onto the edge set of the graph $G$:

$$\mathcal{C}_G := \pi_G(\mathbb{S}^m_{\succ 0}).$$

So $\mathcal{C}_G$ is also a convex cone; it is shown in dark orange in Figure 1. Moreover, we proved in [20], Proposition 2.1, that the cone of sufficient statistics $\mathcal{C}_G$ is the convex dual to the cone of concentration matrices $\mathcal{K}_G$.
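One direction of this duality is easy to see numerically. For $K \in \mathcal{K}_G$ and any positive definite $M$, the pairing $\langle K, M\rangle = \operatorname{trace}(KM)$ is positive, and because $K$ vanishes off the edge set it depends only on $M_G$. The sketch below checks this on the 4-cycle; it does not prove the reverse inclusion, which is the content of [20], Proposition 2.1. The function name `pairing` and the random test matrix are illustrative.

```python
import numpy as np

def pairing(K, M_G):
    """<K, M> = trace(K M) = sum_{i,j} K_ij M_ij.  If K vanishes off the edge
    set, only the G-partial matrix M_G (dict {(i, j): value}, i <= j) is needed;
    off-diagonal pairs are counted twice because both matrices are symmetric."""
    total = 0.0
    for (i, j), v in M_G.items():
        total += K[i, j] * v if i == j else 2 * K[i, j] * v
    return total


K = np.array([[1.0, 0.3, 0.0, 0.3],          # a point of K_G for the 4-cycle
              [0.3, 1.0, 0.3, 0.0],
              [0.0, 0.3, 1.0, 0.3],
              [0.3, 0.0, 0.3, 1.0]])
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
M = A @ A.T + np.eye(4)                       # a positive definite matrix
edge_keys = [(0, 0), (1, 1), (2, 2), (3, 3), (0, 1), (1, 2), (2, 3), (0, 3)]
M_G = {key: M[key] for key in edge_keys}      # its image in C_G
value = pairing(K, M_G)                       # equals trace(K @ M) and is > 0
```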

Example 3.2. For small-dimensional problems we are able to give a graphical representation of the cone of sufficient statistics $\mathcal{C}_G$. For example, consider the Gaussian graphical model on the bipartite graph $K_{2,3}$ with concentration matrices of the form
Note that in order to reduce the number of parameters and be able to draw $\mathcal{C}_G$ in three-dimensional space, we assume additional equality constraints on the nonzero entries of the concentration matrix, represented by the graph coloring above. Such colored Gaussian graphical models, where the coloring represents equality constraints on the concentration matrix, are called RCON-models and were introduced in [16].

Without loss of generality we can rescale $K$ and assume that all diagonal entries are one. The cone of concentration matrices $\mathcal{K}_G$ for this model is shown in Figure 2(a). Its algebraic boundary is described by $\{\det(K) = 0\}$ and is shown in Figure 2(b). In this example, the determinant factors into two components, a cylinder and an ellipsoid. Dualizing the boundary of $\mathcal{K}_G$ by the algorithm described in our previous paper ([20], Proposition 2.4) results in the hypersurface shown in Figure 2(e). The double cone is dual to the cylinder in Figure 2(b). By making the double cone transparent, as shown in Figure 2(d), we see the enclosed ellipsoid, which is dual to the ellipsoid in Figure 2(b). The cone of sufficient statistics $\mathcal{C}_G$ is shown in Figure 2(c). The MLE exists if and only if the sufficient statistic lies in the interior of this convex cone. Using the elimination criterion of Theorem 3.3, we can show that the MLE exists with probability one already for one observation.

[Figure 2: panels (a) to (e).]

Fig. 2. These pictures illustrate the convex geometry of maximum likelihood estimation for Gaussian graphical models. The cone of concentration matrices $\mathcal{K}_G$ is shown in (a), its algebraic boundary in (b), the dual cone of sufficient statistics in (c) and its algebraic boundary in (d) and (e), where (d) is the transparent version of (e).

In this paper, we examine the existence of the MLE for $n$ observations in the range $q \leq n < q^*$, for which the existence of the MLE is not well understood. Geometrically, we look at the manifold of rank $n$ matrices on the boundary of the cone $\mathbb{S}^m_{\succeq 0}$. In general, its projection

(2) $$\pi_G\bigl(\{\Sigma \in \mathbb{S}^m_{\succeq 0} \mid \operatorname{rank}(\Sigma) = n\}\bigr)$$

lies in the topological closure of the cone $\mathcal{C}_G$. The MLE exists with probability one for $n$ observations if and only if the projection (2) lies in the interior of $\mathcal{C}_G$.
Based on the geometric interpretation of maximum likelihood estimation in Gaussian graphical models, we can derive a sufficient condition for the existence of the MLE. The following algebraic elimination criterion can be used as an algorithm to establish existence of the MLE with probability one for $n$ observations.

Theorem 3.3 (Elimination criterion). Let $I_{G,n}$ be the elimination ideal obtained from the ideal of $(n+1) \times (n+1)$ minors of a symmetric $m \times m$ matrix $S$ of unknowns by eliminating all unknowns corresponding to nonedges of the graph $G$. If $I_{G,n}$ is the zero ideal, then the MLE exists with probability one for $n$ observations.

Proof. The variety corresponding to the ideal of $(n+1) \times (n+1)$ minors of a symmetric $m \times m$ matrix $S$ of unknowns consists of all $m \times m$ matrices of rank at most $n$. Eliminating all unknowns corresponding to nonedges of the graph $G$ results in the elimination ideal $I_{G,n}$ (see, e.g., [7]) and is geometrically equivalent to a projection onto the cone of sufficient statistics $\mathcal{C}_G$. Let $V$ be the variety corresponding to the elimination ideal $I_{G,n}$. We denote by $k$ its dimension and by $\mu$ the $k$-dimensional Lebesgue measure. The MLE exists with probability one for $n$ observations if

$$\mu(V \cap \partial\mathcal{C}_G) = 0,$$

where $\partial\mathcal{C}_G$ denotes the boundary of the cone of sufficient statistics $\mathcal{C}_G$.

If $I_{G,n}$ is the zero ideal, then the variety $V$ is full-dimensional, that is, $\dim(V) = k = \dim(\mathcal{C}_G)$. So if we assume that $\mu(V \cap \partial\mathcal{C}_G) > 0$, then $\mu(\partial\mathcal{C}_G) > 0$, which contradicts $\dim(\partial\mathcal{C}_G) < k$. □

For small examples, the elimination ideal $I_{G,n}$ can be computed, for example, using Macaulay2 [13], a software system for research in algebraic geometry. If $I_{G,n}$ is not the zero ideal, then an analysis of polynomial inequalities is required: one needs to carefully examine how the components of $V$ are located. The argument is subtle because the algebraic boundary of $\mathcal{C}_G$ may in fact intersect the interior of $\mathcal{C}_G$. So even if the projection $V$ is a component of the algebraic boundary of $\mathcal{C}_G$, the MLE might still exist with positive probability. We will encounter and describe such an example in detail in Section 6.
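When a Gröbner basis computation is out of reach, the hypothesis of Theorem 3.3 can at least be probed numerically. The ideal of minors is prime, so $I_{G,n}$ is zero exactly when the projection of the rank-at-most-$n$ matrices onto the edge coordinates is dense, i.e. has dimension $|E|$ (self-loops included). Parametrizing those matrices as $S = X^T X$ with $X$ of size $n \times m$, that dimension equals the rank of the Jacobian of $X \mapsto S_G$ at a random point, with probability one. The sketch below is this swapped-in numerical test, not the Macaulay2 computation of the paper; the function name is illustrative, and the edge list `graph_Q` is our 0-based reading of the graph of Theorem 5.1, taken from matrix (7) and the variables eliminated in its proof.

```python
import numpy as np

def projection_is_dominant(m, edges, n, seed=0):
    """Numerical stand-in for 'I_{G,n} is the zero ideal'.

    Parametrise rank <= n matrices as S = X^T X with X of shape (n, m) and
    compute the Jacobian of X -> S_G at a random point.  Its rank equals the
    dimension of the projected variety with probability one, so full rank |E|
    (edges plus self-loops) signals a dominant projection.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, m))
    coords = [(i, i) for i in range(m)] + [(min(i, j), max(i, j)) for i, j in edges]
    J = np.zeros((len(coords), n * m))
    for r, (i, j) in enumerate(coords):
        for k in range(n):
            # S_ij = sum_k X[k, i] * X[k, j]; differentiate w.r.t. X[k, i] and X[k, j]
            J[r, k * m + i] += X[k, j]
            J[r, k * m + j] += X[k, i]
    return bool(np.linalg.matrix_rank(J) == len(coords))


cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
graph_Q = [(0, 1), (0, 3), (1, 2), (1, 4), (2, 5), (3, 4), (3, 5), (4, 5)]
```

For the 4-cycle the test succeeds for $n = 3$ and fails for $n = 2$, where rank-2 matrices span only 7 of the 8 edge coordinates. A failure only says the criterion is inconclusive; as explained above, the MLE may still exist with positive probability.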

4. Bipartite graphs. In this section, we first derive MLE existence results for bipartite graphs $K_{2,m}$, paralleling the results on cycles proven by Buhl [6]. Let the graph $K_{2,m}$ be labeled as shown in Figure 3. A minimal chordal cover is given in Figure 3 (right). As for cycles, for bipartite graphs $K_{2,m}$ we have $q = 2$ and $q^* = 3$. Therefore only the case of $n = 2$ observations is interesting.

Let $X_1$ and $X_2$ denote two independent samples from the distribution $\mathcal{N}_{m+2}(0, \Sigma)$, which obeys the undirected pairwise Markov property on $K_{2,m}$. We denote by $X$ the $(m+2) \times 2$ data matrix consisting of the two samples $X_1$ and $X_2$ as columns. The rows of $X$ are denoted by $x_1, \dots, x_{m+2}$. Similarly as for cycles in [6], we will describe a criterion on the configuration of the data vectors $x_1, \dots, x_{m+2}$ ensuring the existence of the MLE. Our proof follows the argument used by Buhl [6] for cycles. The following characterization of positive definite $3 \times 3$ matrices, proven in [3], will be helpful in this context.

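Lemma 4.1. The matrix

$$\begin{pmatrix} 1 & \cos\alpha & \cos\beta \\ \cos\alpha & 1 & \cos\gamma \\ \cos\beta & \cos\gamma & 1 \end{pmatrix}$$

with $0 < \alpha, \beta, \gamma < \pi$ is positive definite if and only if

$$\alpha < \beta + \gamma, \quad \beta < \alpha + \gamma, \quad \gamma < \alpha + \beta, \quad \alpha + \beta + \gamma < 2\pi.$$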
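Lemma 4.1 is easy to sanity-check by sampling: draw random angles, test positive definiteness through the smallest eigenvalue, and compare with the four inequalities. A sketch, with illustrative helper names:

```python
import numpy as np

def gram(alpha, beta, gamma):
    """The 3 x 3 matrix of Lemma 4.1."""
    ca, cb, cg = np.cos(alpha), np.cos(beta), np.cos(gamma)
    return np.array([[1.0, ca, cb],
                     [ca, 1.0, cg],
                     [cb, cg, 1.0]])


def lemma_condition(alpha, beta, gamma):
    """Triangle inequalities plus alpha + beta + gamma < 2*pi."""
    return bool(alpha < beta + gamma and beta < alpha + gamma
                and gamma < alpha + beta and alpha + beta + gamma < 2 * np.pi)


def is_positive_definite(M):
    return bool(np.linalg.eigvalsh(M).min() > 0)


rng = np.random.default_rng(0)
mismatches = 0
for _ in range(5000):
    a, b, c = rng.uniform(0.0, np.pi, size=3)
    if is_positive_definite(gram(a, b, c)) != lemma_condition(a, b, c):
        mismatches += 1
```

Geometrically, the inequalities say that $\alpha, \beta, \gamma$ can be realized as the pairwise angles between three linearly independent unit vectors in $\mathbb{R}^3$, whose Gram matrix is the matrix above.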

Proposition 4.2. The MLE on the graph $K_{2,m}$ exists with probability one for $n \geq 3$ observations, and the MLE does not exist for $n < 2$ observations. For $n = 2$ observations the MLE exists if and only if the lines generated by $x_1$ and $x_2$ are direct neighbors [see Figure 4 (left)].




Fig. 4. The MLE on $K_{2,m}$ exists in the following situations. Lines and data vectors corresponding to the variables 1 and 2 are drawn in blue. Lines and data vectors corresponding to the variables $3, 4, \dots, m+2$ are drawn in red.



Proof. Because the problem of existence of the MLE is a positive definite matrix completion problem, we can rescale and rotate the data vectors $x_1, \dots, x_{m+2}$ (i.e., perform an orthogonal transformation) without changing the problem. So without loss of generality we can assume that the vectors $x_1, \dots, x_{m+2} \in \mathbb{R}^2$ have length one, lie in the upper unit half circle and $x_1 = (1, 0)$. We need to prove that the MLE exists if and only if the data configuration is as shown in Figure 4 (middle) or (right).

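Let $\theta_{ij}$ denote the angle between the vectors $x_i$ and $x_j$. Then the $K_{2,m}$-partial sample covariance matrix $S_{K_{2,m}}$ is of the form

$$S_{K_{2,m}} = \begin{pmatrix}
1 & * & \cos\theta_{13} & \cos\theta_{14} & \cdots & \cos\theta_{1,m+2} \\
* & 1 & \cos\theta_{23} & \cos\theta_{24} & \cdots & \cos\theta_{2,m+2} \\
\cos\theta_{13} & \cos\theta_{23} & 1 & * & \cdots & * \\
\cos\theta_{14} & \cos\theta_{24} & * & 1 & \cdots & * \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
\cos\theta_{1,m+2} & \cos\theta_{2,m+2} & * & * & \cdots & 1
\end{pmatrix}.$$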

We put stars (*) at all positions not corresponding to edges in the graph. 
The stars represent the entries of the sample covariance matrix which are 
not part of the sufficient statistics. 

The graph $K_{2,m}$ can be extended to a chordal graph by adding one edge, as shown in Figure 3 (right). So by Theorem 2.2, $S_{K_{2,m}}$ can be extended to a positive definite matrix if and only if the $(1,2)$ entry of $S_{K_{2,m}}$ can be completed in such a way that all the submatrices corresponding to maximal cliques are positive definite. This is equivalent to the existence of $\rho \in \mathbb{R}$ with $0 < \rho < \pi$ such that

$$\begin{pmatrix} 1 & \cos\rho & \cos\theta_{1i} \\ \cos\rho & 1 & \cos\theta_{2i} \\ \cos\theta_{1i} & \cos\theta_{2i} & 1 \end{pmatrix} \succ 0 \quad \text{for all } i \in \{3, 4, \dots, m+2\}.$$



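By Lemma 4.1 this occurs if and only if

$$|\theta_{1i} - \theta_{2i}| < \rho < \min(\theta_{1j} + \theta_{2j},\; 2\pi - \theta_{1j} - \theta_{2j}) \quad \text{for all } i, j \in \{3, 4, \dots, m+2\},$$

which is equivalent to

(3) $$2\theta_{ai} < \theta_{1i} + \theta_{2i} + \theta_{1j} + \theta_{2j} < 2\pi + 2\theta_{ai}$$

for all $a \in \{1, 2\}$, $i, j \in \{3, 4, \dots, m+2\}$. We distinguish two cases.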

Case 1. There is a vector $x_j$ lying between $x_1$ and $x_2$, which implies that $\theta_{1j} + \theta_{2j} = \theta_{12}$. If there were a vector $x_i$, $i \neq j$, which does not lie between $x_1$ and $x_2$, then

$$\theta_{1j} + \theta_{2j} + \theta_{1i} + \theta_{2i} = 2\theta_{1i},$$

which is a contradiction to (3). Hence all vectors $x_3, x_4, \dots, x_{m+2}$ lie between $x_1$ and $x_2$, in which case

$$\theta_{1i} + \theta_{2i} + \theta_{1j} + \theta_{2j} = 2\theta_{12},$$

and inequality (3) is satisfied.

Case 2. The vectors $x_1$ and $x_2$ are direct neighbors, which implies that $\theta_{1i} + \theta_{2i} = \theta_{12} + 2\theta_{2i}$ for all $i \in \{3, 4, \dots, m+2\}$, in which case inequality (3) is satisfied.

This proves that for two observations, the MLE exists if and only if the 
data configuration is as shown in Figure 4 (middle) or (right). □ 

The geometric explanation of what is happening in this example is that the projection of the positive semidefinite matrices of rank 2 intersects both the interior and the boundary of the cone of sufficient statistics $\mathcal{C}_G$ with positive measure. The sufficient statistics originating from data vectors for which lines 1 and 2 are neighbors lie in the interior of $\mathcal{C}_G$. If lines 1 and 2 are not neighbors, the corresponding sufficient statistics lie on the boundary of the cone $\mathcal{C}_G$, and the MLE does not exist. A similar situation is encountered in Example 6.2 and depicted in Figure 8.

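It is worth remarking that if the $m+2$ variables are independent, we can compute the probability of existence of the MLE by a combinatorial argument. In this case the $m+2$ lines generated by the data vectors appear in a uniformly random cyclic order, and the line of $x_2$ is equally likely to occupy each of the $m+1$ positions relative to the line of $x_1$, two of which are adjacent to it. Hence the probability that the MLE exists is

$$\frac{2\, m!}{(m+1)!} = \frac{2}{m+1}.$$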
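A Monte Carlo check of the value $2/(m+1)$, assuming independent standard normal variables (any diagonal $\Sigma$ gives the same distribution of line directions). By Proposition 4.2 the MLE exists exactly when the lines of $x_1$ and $x_2$ are neighbors in the cyclic order of all $m+2$ lines, which the sketch tests directly; the helper names are illustrative.

```python
import numpy as np

def lines_adjacent(angles):
    """True if lines 0 and 1 are neighbours in the cyclic order of all lines."""
    order = np.argsort(angles)
    rank = np.empty(len(angles), dtype=int)
    rank[order] = np.arange(len(angles))
    gap = abs(int(rank[0]) - int(rank[1]))
    return gap == 1 or gap == len(angles) - 1


def estimate_existence_probability(m, trials=20000, seed=0):
    """Monte Carlo estimate of P(MLE exists) on K_{2,m} for n = 2 observations
    when all m + 2 variables are independent standard normals."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        X = rng.standard_normal((m + 2, 2))                    # rows x_1, ..., x_{m+2}
        angles = np.mod(np.arctan2(X[:, 1], X[:, 0]), np.pi)   # directions of the lines
        hits += lines_adjacent(angles)
    return hits / trials


for m in (2, 3, 5):
    print(m, estimate_existence_probability(m, trials=10000), 2 / (m + 1))
```

Taking angles modulo $\pi$ identifies a data vector with the line it spans, so sorting the angles gives the cyclic order of the lines.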

A different approach to gaining a better understanding of maximum likelihood estimation in Gaussian graphical models is to study the ML degree of the underlying graph. The map taking a sample covariance matrix $S$ to its maximum likelihood estimate $\hat\Sigma$ is an algebraic function, and its degree is the ML degree of the model; see [9], Definition 2.1.4. The ML degree represents the algebraic complexity of the problem of finding the MLE. This suggests that a larger ML degree results in a more difficult MLE existence problem. We proved in [20] that the ML degree is one if and only if the underlying graph is chordal. It is conjectured in [9], Section 7.4, that the ML degree of the cycle grows exponentially in the cycle length. An interesting contrast to the cycle conjecture is the following theorem, where we prove that the ML degree for bipartite graphs $K_{2,m}$ grows linearly in the number of variables.

Theorem 4.3. In a Gaussian graphical model with underlying graph $K_{2,m}$, $m \geq 2$, the ML degree is $2m + 1$.

Proof. Given a generic matrix $S \in \mathbb{S}^{m+2}$, we fix $\Sigma \in \mathbb{S}^{m+2}$ with entries $\Sigma_{ij} = S_{ij}$ for $(i, j) \in E$ and unknowns $\Sigma_{12} = \Sigma_{21} = y$ and $\Sigma_{ij} = z_{ij}$ for all other $(i, j) \notin E$. We denote by $K = \Sigma^{-1}$ the corresponding concentration matrix. The ML degree of $K_{2,m}$ is the number of complex solutions to

$$K_{ij} = 0 \quad \text{for all } (i, j) \notin E.$$

Let $A$ denote the set consisting of the two distinguished vertices $\{1, 2\}$, and let $B = V \setminus A$. In the following we will use the block structure

$$\Sigma = \begin{pmatrix} \Sigma_{AA} & \Sigma_{AB} \\ \Sigma_{BA} & \Sigma_{BB} \end{pmatrix}, \qquad K = \begin{pmatrix} K_{AA} & K_{AB} \\ K_{BA} & K_{BB} \end{pmatrix}.$$

For example, for the graph $K_{2,5}$ the corresponding covariance matrix $\Sigma$ and concentration matrix $K$ are of the form
Note that the block $K_{BB}$ is a diagonal matrix. Hence its inverse, the Schur complement

$$K_{BB}^{-1} = \Sigma_{BB} - \Sigma_{BA}\Sigma_{AA}^{-1}\Sigma_{AB},$$

is also a diagonal matrix. After rescaling, we may assume that all diagonal entries of $S$ equal one. Writing out the off-diagonal entries of this matrix then results in the following expression for the variables $z$ in terms of the variable $y$:

$$z_{ij} = \frac{1}{y^2 - 1}\bigl(y(s_{1i}s_{2j} + s_{1j}s_{2i}) - s_{1i}s_{1j} - s_{2i}s_{2j}\bigr).$$

Setting the minor $M_{12}$ of $\Sigma$ to zero results in the last equation, of the form

(4) $$y \det(\Sigma_{BB}) + (\text{polynomial in } z \text{ of degree } m - 1) = 0.$$

We note that $\det(\Sigma_{BB})$ is a polynomial in $z$ of degree $m$ whose constant term is 1. So by multiplying equation (4) by $(1 - y^2)^m$, we get a degree $2m + 1$ equation in $y$ and therefore $2m + 1$ complex solutions for $y$. For each solution of $y$ we get one solution for the variables $z$, which proves that the ML degree of $K_{2,m}$ is $2m + 1$. □
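The formula for $z_{ij}$ can be checked numerically: for any value of $y$, filling $\Sigma_{BB}$ with these $z_{ij}$ should make $K_{BB}$ diagonal, which is the step that removes the $z$ variables from the system. A sketch assuming unit diagonal (as in the proof), 0-based indexing with $A = \{0, 1\}$, and illustrative random edge entries; only the remaining equation $K_{12} = 0$, which determines $y$, is not imposed here.

```python
import numpy as np

def z_from_y(y, s1, s2):
    """Off-diagonal entries z_ij of Sigma_BB forced by K_BB being diagonal
    (unit diagonal of S assumed); s1[i] = s_{1i}, s2[i] = s_{2i} for i in B."""
    nb = len(s1)
    Z = np.eye(nb)
    for i in range(nb):
        for j in range(nb):
            if i != j:
                Z[i, j] = (y * (s1[i] * s2[j] + s1[j] * s2[i])
                           - s1[i] * s1[j] - s2[i] * s2[j]) / (y ** 2 - 1)
    return Z


def build_sigma(y, s1, s2):
    """Sigma for K_{2,m}: vertices 0, 1 form A, the remaining m vertices form B."""
    nb = len(s1)
    Sigma = np.empty((nb + 2, nb + 2))
    Sigma[:2, :2] = [[1.0, y], [y, 1.0]]
    Sigma[0, 2:] = Sigma[2:, 0] = s1
    Sigma[1, 2:] = Sigma[2:, 1] = s2
    Sigma[2:, 2:] = z_from_y(y, s1, s2)
    return Sigma


rng = np.random.default_rng(0)
s1, s2 = rng.uniform(-0.5, 0.5, size=(2, 4))   # generic edge entries, m = 4
K = np.linalg.inv(build_sigma(0.3, s1, s2))
K_BB = K[2:, 2:]
off_diagonal = K_BB - np.diag(np.diag(K_BB))    # vanishes up to rounding
```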

Bipartite graphs $K_{2,m}$ and cycles are classes of graphs with $q = 2$ and $q^* = 3$. What can we say about such graphs in general regarding the existence of the MLE for two observations? A related question has been studied from a purely algebraic point of view in [4]. A cycle-completable graph is defined to be a graph such that every partial matrix $M_G$ has a positive definite completion if and only if $M_G$ is positive definite on all submatrices corresponding to maximal cliques in the graph and all submatrices corresponding to cycles in the graph can be completed to a positive definite matrix. It is shown in [4] that a graph is cycle-completable if and only if there is a chordal cover with no new 4-clique.

Fig. 5. Bipartite graph $K_{3,m}$ (left) and minimal chordal cover of $K_{3,m}$ (middle). The tetrahedron-shaped pillow consisting of all positive semidefinite $3 \times 3$ matrices with ones on the diagonal is shown in the right figure.

Buhl [6] studied cycles from a more statistical point of view and described a criterion on the data vectors for the existence of the MLE for two observations. Combining the results of [4] and [6], we get the following result:

COROLLARY 4.4. Let $G$ be a graph with $q = 2$ and $q^* \geq 3$. Then the following statements are equivalent:

(i) For $n = 2$ observations, the MLE exists if and only if Buhl's cycle condition is satisfied on every induced cycle.

(ii) $q^* = 3$.

This result solves the problem of existence of the MLE for all graphs with $q = 2$ and $q^* = 3$. Note that Corollary 4.4 is more general than Proposition 4.2. Its proof, however, is more involved and less constructive.

For bipartite graphs $K_{3,m}$ the situation is more complicated, and we do not yet have results similar to Proposition 4.2 and Theorem 4.3. We will nevertheless describe some preliminary results.

Let the graph $K_{3,m}$ be labeled as shown in Figure 5. A minimal chordal cover is given in Figure 5 (middle). Hence, $q = 2$ and $q^* = 4$. The convex body shown in Figure 5 (right) consists of all positive semidefinite $3 \times 3$ matrices with ones on the diagonal. We call it the tetrahedron-shaped pillow. We will prove that the existence of the MLE is equivalent to a nonempty intersection of such inflated and shifted tetrahedron-shaped pillows.

Corollary 4.5. The MLE on the graph $K_{3,m}$ exists if and only if the $m$ inflated and shifted tetrahedron-shaped pillows corresponding to the maximal cliques in a minimal chordal cover of $K_{3,m}$ shown in Figure 5 (middle) have a nonempty intersection.
Proof. Applying Theorem 2.2 in a similar way as in the proof of Proposition 4.2, the partial covariance matrix $S_{K_{3,m}}$ can be extended to a positive definite matrix if and only if the entries corresponding to the missing edges $(1,2)$, $(1,3)$ and $(2,3)$ can be completed in such a way that all the submatrices corresponding to maximal cliques in the minimal chordal cover (Figure 5, middle) are positive definite. After rescaling to unit diagonal, this is equivalent to the existence of $x, y, z \in \mathbb{R}$ with $-1 < x, y, z < 1$ such that

(5) $$\begin{pmatrix} 1 & x & y & s_{1i} \\ x & 1 & z & s_{2i} \\ y & z & 1 & s_{3i} \\ s_{1i} & s_{2i} & s_{3i} & 1 \end{pmatrix} \succ 0 \quad \text{for all } i \in \{4, 5, \dots, m+3\},$$

where $s_{ai}$, $a \in \{1, 2, 3\}$, $i \in \{4, 5, \dots, m+3\}$, are the sufficient statistics corresponding to edges in the bipartite graph $K_{3,m}$. Using Schur complements and rescaling, (5) holds if and only if

(6) $$\begin{pmatrix} 1 & \tilde x_i & \tilde y_i \\ \tilde x_i & 1 & \tilde z_i \\ \tilde y_i & \tilde z_i & 1 \end{pmatrix} \succ 0,$$

where

$$\tilde x_i = \frac{x - s_{1i}s_{2i}}{\sqrt{(1 - s_{1i}^2)(1 - s_{2i}^2)}}, \quad \tilde y_i = \frac{y - s_{1i}s_{3i}}{\sqrt{(1 - s_{1i}^2)(1 - s_{3i}^2)}}, \quad \tilde z_i = \frac{z - s_{2i}s_{3i}}{\sqrt{(1 - s_{2i}^2)(1 - s_{3i}^2)}}$$

for all $i \in \{4, 5, \dots, m+3\}$ (here $|s_{ai}| < 1$, since otherwise a $2 \times 2$ submatrix corresponding to an edge is not positive definite and the MLE cannot exist). So the MLE exists if and only if the inflated and shifted tetrahedron-shaped pillows corresponding to the inequalities in (6) have nonempty intersection. □
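The passage from (5) to (6) takes the Schur complement of the $(4,4)$ entry and rescales to unit diagonal. A randomized check that the two positive definiteness conditions agree, assuming unit diagonal and $|s_{ai}| < 1$; the helper names are illustrative.

```python
import numpy as np

def clique_block(x, y, z, s):
    """Left-hand side of (5) for one i, with s = (s_1i, s_2i, s_3i)."""
    s1, s2, s3 = s
    return np.array([[1.0, x, y, s1],
                     [x, 1.0, z, s2],
                     [y, z, 1.0, s3],
                     [s1, s2, s3, 1.0]])


def pillow_coordinates(x, y, z, s):
    """Left-hand side of (6): Schur complement of the (4, 4) entry, rescaled
    to unit diagonal.  Requires |s_ai| < 1."""
    s1, s2, s3 = s
    scale = lambda a, b: np.sqrt((1 - a ** 2) * (1 - b ** 2))
    xt = (x - s1 * s2) / scale(s1, s2)
    yt = (y - s1 * s3) / scale(s1, s3)
    zt = (z - s2 * s3) / scale(s2, s3)
    return np.array([[1.0, xt, yt],
                     [xt, 1.0, zt],
                     [yt, zt, 1.0]])


def is_pd(M):
    return bool(np.linalg.eigvalsh(M).min() > 0)


rng = np.random.default_rng(0)
mismatches = 0
for _ in range(5000):
    s = rng.uniform(-0.9, 0.9, size=3)         # edge statistics for one i
    x, y, z = rng.uniform(-1.0, 1.0, size=3)   # candidate entries (1,2), (1,3), (2,3)
    if is_pd(clique_block(x, y, z, s)) != is_pd(pillow_coordinates(x, y, z, s)):
        mismatches += 1
```

In the original coordinates $(x, y, z)$, condition (6) describes the interior of the pillow with each axis scaled by $\sqrt{(1 - s_{ai}^2)(1 - s_{bi}^2)}$ and shifted by $s_{ai}s_{bi}$, one such set for each $i$.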



We used the software package Macaulay2 to compute the ML degree of $K_{3,m}$ for $m \leq 4$. It is an open problem to find a general formula or a recurrence relation for the ML degree of $K_{l,m}$, where $l \geq 3$.

m            1    2    3     4
ML degree    1    7    57    131



5. Small graphs. In this section we analyze the 3 × 3 grid in particular, and we complete the discussion of [20] with the number of observations and the corresponding existence probability of the MLE for all graphs with five or fewer vertices.

The 3 × 3 grid is shown in Figure 6 (left) and has $q = 2$ and $q^* = 4$. This example represents the starting point of this paper and is the original problem posed by Steffen Lauritzen during his lecture at the "Durham Symposium on Mathematical Aspects of Graphical Models" in 2008. As a preparation, we first discuss the existence of the MLE for the graph Q on six vertices shown in Figure 7. The graph Q has $q = 3$ and, like the grid, $q^* = 4$. It is the first example for which we can prove that the bound $n \geq q^*$ for the existence of the MLE with probability one is not tight and that the MLE can exist with probability one even when the number of observations equals the treewidth.

Fig. 6. 3 × 3 grid $\mathcal{H}$ (left) and grid with additional edge $\mathcal{H}'$ (right).

Theorem 5.1. The MLE on the graph Q (Figure 7, left) exists with 
probability one for n = 3 observations. 

Proof. We compute the ideal $I_{Q,3}$ by eliminating the variables $s_{13}, s_{15}, s_{16}, s_{24}, s_{26}, s_{34}, s_{35}$ from the ideal of $4 \times 4$ minors of the matrix $S$ given in (7). This results in the zero ideal, which by Theorem 3.3 completes the proof. □



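Remark 5.2. Theorem 5.1 is equivalent to the following purely algebraic statement. Let

(7) $$S = \begin{pmatrix} 1 & s_{12} & s_{13} & s_{14} & s_{15} & s_{16} \\ s_{12} & 1 & s_{23} & s_{24} & s_{25} & s_{26} \\ s_{13} & s_{23} & 1 & s_{34} & s_{35} & s_{36} \\ s_{14} & s_{24} & s_{34} & 1 & s_{45} & s_{46} \\ s_{15} & s_{25} & s_{35} & s_{45} & 1 & s_{56} \\ s_{16} & s_{26} & s_{36} & s_{46} & s_{56} & 1 \end{pmatrix} \succeq 0$$

be generic with $\operatorname{rank}(S) = 3$. Then there exist $x, y, a, b, c, d, e \in \mathbb{R}$ such that

$$S' = \begin{pmatrix} 1 & s_{12} & a & s_{14} & b & c \\ s_{12} & 1 & s_{23} & x & s_{25} & y \\ a & s_{23} & 1 & d & e & s_{36} \\ s_{14} & x & d & 1 & s_{45} & s_{46} \\ b & s_{25} & e & s_{45} & 1 & s_{56} \\ c & y & s_{36} & s_{46} & s_{56} & 1 \end{pmatrix} \succ 0.$$

So any generic partial matrix of rank 3 with specified entries at all positions corresponding to edges in Q can be completed to a positive definite matrix.

Fig. 7. Graph Q (left) and minimal chordal cover of Q (right).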

Corollary 5.3. Let "H be the 3 x 3- grid shown in Figure 6. Then the 
MLE on % exists with probability one for n > 3 observations, and the MLE 
does not exist for n < 2 observations. 

Proof. First note that Groebner bases computations are extremely 
memory intensive and the elimination ideal 7-^3 cannot be computed di- 
rectly due to insufficient memory. We solve this problem by gluing together 
smaller graphs. The probability of existence of the MLE for the 3x3 grid H 
is at least as large as the existence probability when the underlying graph 
is %' . The graph %' is a clique sum of two graphs of the form Q, for which 
the MLE existence probability is one for n > 3. □ 

This example shows that although we are not able to compute the elimination ideal for large graphs directly, the algebraic elimination criterion (Theorem 3.3) remains useful in this situation: we can study small graphs with the elimination criterion and glue them together using clique sums to build larger graphs.

For two observations on the 3 × 3 grid, the cycle conditions are necessary but not sufficient for the existence of the MLE (Corollary 4.4). Unlike for bipartite graphs $K_{2,m}$, the existence of the MLE does not depend only on the ordering of the lines corresponding to the data vectors in $\mathbb{R}^2$. By simulations with the Matlab software cvx [12], one can easily find orderings for which the MLE sometimes exists and sometimes does not. Finding a necessary and sufficient criterion for the existence of the MLE for two observations remains an open problem.

We now complete the discussion of [20] with the number of observations and the corresponding existence probability of the MLE for all graphs with five or fewer vertices. All nonchordal graphs with five or fewer vertices are shown in Table 1. The 4-cycle and 5-cycle in (a) and (b) are covered by Buhl's results [6]. The graphs in (c) and (d) are clique sums of two graphs and are therefore completable if and only if the submatrices corresponding to the two subgraphs are completable. Graph (e) is the bipartite graph $K_{2,3}$ and is covered by Proposition 4.2. For the graph in (f), $q = 3$ and $q^* = 4$; applying the elimination criterion from Theorem 3.3 shows that three observations are sufficient for the existence of the MLE. The last example, the 5-wheel in graph (g), is also covered by Buhl's results [6].

Table 1
This table shows the number of observations (obs.) and the corresponding MLE existence probability p for all nonchordal graphs on 5 or fewer vertices

Graph G            1 obs.   2 obs.       3 obs.       ≥4 obs.
(a) 4-cycle        No       p ∈ (0,1)    p = 1        p = 1
(b) 5-cycle        No       p ∈ (0,1)    p = 1        p = 1
(c) clique sum     No       p ∈ (0,1)    p = 1        p = 1
(d) clique sum     No       No           p = 1        p = 1
(e) $K_{2,3}$      No       p ∈ (0,1)    p = 1        p = 1
(f)                No       No           p = 1        p = 1
(g) 5-wheel        No       No           p ∈ (0,1)    p = 1

6. Colored Gaussian graphical models. For some applications, symme- 
tries in the underlying Gaussian graphical model can be assumed. Adding 
symmetry to the conditional independence restrictions of a graphical model 
reduces the number of parameters and in some cases also the number of ob- 
servations needed for the existence of the MLE. The symmetry restrictions 
can be represented by a graph coloring, where edges (or vertices) have the
same color if the corresponding elements of the concentration
matrix are equal. Such models are called RCON-models [16]. We discussed 
such a model earlier in Example 3.2. 

We denote the uncolored graph by $G$ and the colored graph by $\mathcal{G}$. Note
that in this section the graph $G$ does not contain any self-loops. Let the
vertices be colored with $p$ different colors and the edges with $q$ different
colors:

$$V = V_1 \cup V_2 \cup \cdots \cup V_p, \quad p \leq |V|,$$

$$E = E_1 \cup E_2 \cup \cdots \cup E_q, \quad q \leq |E|.$$

Then the set of all concentration matrices $\mathcal{K}_\mathcal{G}$ consists of all positive definite
matrices satisfying:

• $K_{\alpha\beta} = 0$ for any pair of vertices $\alpha, \beta$ that do not form an edge in $G$.

• $K_{\alpha\alpha} = K_{\beta\beta}$ for any pair of vertices $\alpha, \beta$ in a common vertex color class $V_i$.

• $K_{\alpha\beta} = K_{\gamma\delta}$ for any pair of edges $(\alpha, \beta), (\gamma, \delta)$ in a common edge color
class $E_j$.
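The three conditions above can be turned into a parametrization of the linear space that contains $\mathcal{K}_\mathcal{G}$. The following Python sketch is our own illustration: `rcon_matrix` is a hypothetical helper, vertices are labeled from 1, symbolic parameters are represented by strings, and positive definiteness is not checked. It assigns one parameter to each color class and fills in the concentration matrix; with the coloring of Frets's heads used in Example 6.2 it reproduces the zero pattern and the equal entries of the matrix $K$ given there.

```python
def rcon_matrix(n, vertex_classes, edge_classes, params):
    """Fill an n x n concentration matrix from color-class parameters (sketch).

    vertex_classes: list of vertex color classes, e.g. [[1, 2], [3, 4]]
    edge_classes:   list of edge color classes, e.g. [[(1, 2)], [(2, 3), (1, 4)]]
    params:         one value per class, vertex classes first, then edge classes
    Entries for pairs that are not edges stay 0. Positive definiteness is NOT checked.
    """
    if len(params) != len(vertex_classes) + len(edge_classes):
        raise ValueError("need exactly one parameter per color class")
    K = [[0] * n for _ in range(n)]
    vertex_params = params[:len(vertex_classes)]
    edge_params = params[len(vertex_classes):]
    for lam, vclass in zip(vertex_params, vertex_classes):
        for a in vclass:
            K[a - 1][a - 1] = lam  # equal diagonal entries within V_i
    for lam, eclass in zip(edge_params, edge_classes):
        for a, b in eclass:
            K[a - 1][b - 1] = K[b - 1][a - 1] = lam  # equal entries within E_j
    return K


# Coloring of the 4-cycle 1-2-3-4 used for Frets's heads (Example 6.2).
V = [[1, 2], [3, 4]]
E = [[(1, 2)], [(2, 3), (1, 4)], [(3, 4)]]
K = rcon_matrix(4, V, E, ["l1", "l2", "l3", "l4", "l5"])
for row in K:
    print(row)
```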

Hence, for RCON-models, too, the set $\mathcal{K}_\mathcal{G}$ is defined by linear
equations on the concentration matrix $K$. So the geometry of maximum
likelihood estimation is the same as that explained in Section 3, and it is
straightforward to derive the analogue of Theorem 2.1 for colored Gaussian
graphical models.

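Theorem 6.1. In a colored Gaussian graphical model on $\mathcal{G}$, the MLE
of the covariance matrix $\Sigma$ exists if and only if there is a positive definite
matrix $\Sigma$ such that

$$\sum_{\alpha \in V_i} \Sigma_{\alpha\alpha} = \sum_{\alpha \in V_i} S_{\alpha\alpha} \quad \text{and} \quad \sum_{(\alpha,\beta) \in E_j} \Sigma_{\alpha\beta} = \sum_{(\alpha,\beta) \in E_j} S_{\alpha\beta}$$

for all vertex color classes $V_1, \dots, V_p$ and all edge color classes $E_1, \dots, E_q$.
Then the MLE $\hat{\Sigma}$ is the unique completion with $(\hat{\Sigma}^{-1})_{\alpha\alpha} = (\hat{\Sigma}^{-1})_{\beta\beta}$ for any
pair of vertices $\alpha, \beta$ in a common vertex color class $V_i$, $(\hat{\Sigma}^{-1})_{\alpha\beta} = (\hat{\Sigma}^{-1})_{\gamma\delta}$
for any pair of edges $(\alpha, \beta), (\gamma, \delta)$ in a common edge color class $E_j$, and
$(\hat{\Sigma}^{-1})_{\alpha\beta} = 0$ for all $(\alpha, \beta) \notin E$.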
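Theorem 6.1 turns existence into a feasibility question: exhibiting a single positive definite $\Sigma$ whose class sums agree with those of $S$ certifies that the MLE exists. The Python sketch below is our own illustration (helper names and the numerical matrices are hypothetical, and the coloring is that of Frets's heads in Example 6.2). It computes the class sums and checks whether a given candidate $\Sigma$ is such a witness, testing positive definiteness with a Cholesky factorization. It does not compute the MLE itself.

```python
import math


def colored_margins(M, vertex_classes, edge_classes):
    """Class sums from Theorem 6.1 (vertices labeled from 1).

    A vertex class V_i contributes sum_{a in V_i} M_aa, an edge class E_j
    contributes sum_{(a, b) in E_j} M_ab.
    """
    vsums = [sum(M[a - 1][a - 1] for a in vc) for vc in vertex_classes]
    esums = [sum(M[a - 1][b - 1] for a, b in ec) for ec in edge_classes]
    return vsums + esums


def is_positive_definite(M):
    """Cholesky-Banachiewicz test for a symmetric matrix given as nested lists."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


def is_witness(Sigma, S, vertex_classes, edge_classes, tol=1e-12):
    """True if Sigma is positive definite and matches all class sums of S."""
    a = colored_margins(Sigma, vertex_classes, edge_classes)
    b = colored_margins(S, vertex_classes, edge_classes)
    return is_positive_definite(Sigma) and all(abs(x - y) <= tol for x, y in zip(a, b))


V = [[1, 2], [3, 4]]
E = [[(1, 2)], [(2, 3), (1, 4)], [(3, 4)]]
S = [[2.0, 0.5, 0.1, 0.3],
     [0.5, 4.0, 0.2, 0.1],
     [0.1, 0.2, 1.0, 0.4],
     [0.3, 0.1, 0.4, 3.0]]
# Differs from S entrywise but has the same class sums; entries 13 and 24 are free.
Sigma = [[3.0, 0.5, 0.1, 0.25],
         [0.5, 3.0, 0.25, 0.2],
         [0.1, 0.25, 2.0, 0.4],
         [0.25, 0.2, 0.4, 2.0]]
print(colored_margins(S, V, E), is_witness(Sigma, S, V, E))
```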

Example 6.2 (Frets's heads). We revisit the heredity study of head 
dimensions known as Frets's heads reported in [10]. Part of the original data 
are the length and breadth of the heads of 25 pairs of first and second sons. 
This data set was also discussed in [18, 20]. The data supports the following 
colored Gaussian graphical model, where the joint distribution remains the 
same when the two sons are exchanged: 

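[Figure: colored 4-cycle on the vertices 1, 2, 3, 4]

$$K = \begin{pmatrix} \lambda_1 & \lambda_3 & 0 & \lambda_4 \\ \lambda_3 & \lambda_1 & \lambda_4 & 0 \\ 0 & \lambda_4 & \lambda_2 & \lambda_5 \\ \lambda_4 & 0 & \lambda_5 & \lambda_2 \end{pmatrix}$$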

In this graph, variable 1 corresponds to the length of the first son's head,
variable 2 to the length of the second son's head, variable 3 to the breadth
of the second son's head and variable 4 to the breadth of the first son's head.
Color classes consisting of only one edge (or vertex) are displayed in black.

Fig. 8. All possible sufficient statistics from one observation are shown on the left. The
cone of sufficient statistics is shown on the right.

Given a sample covariance matrix $S = (s_{ij})$, the five sufficient statistics
for this model according to the graph coloring are

$$t_1 = s_{11} + s_{22}, \quad t_2 = s_{33} + s_{44}, \quad t_3 = 2s_{12},$$

$$t_4 = 2(s_{23} + s_{14}), \quad t_5 = 2s_{34}.$$
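Each $t_i$ is the trace inner product $\langle S, E_i \rangle = \sum_{\alpha,\beta} S_{\alpha\beta}(E_i)_{\alpha\beta}$ of $S$ with the symmetric 0/1 indicator matrix $E_i$ of the corresponding color class; this is why the edge statistics carry a factor 2 while the vertex statistics do not. A small Python sketch of this bookkeeping follows; the helper names and the integer test matrix are our own choices.

```python
def indicator(n, vertex_class=(), edge_class=()):
    """Symmetric 0/1 matrix of one color class (vertices labeled from 1)."""
    E = [[0] * n for _ in range(n)]
    for a in vertex_class:
        E[a - 1][a - 1] = 1
    for a, b in edge_class:
        E[a - 1][b - 1] = E[b - 1][a - 1] = 1
    return E


def inner(S, E):
    """Trace inner product <S, E> = sum over all entries of S * E."""
    n = len(S)
    return sum(S[i][j] * E[i][j] for i in range(n) for j in range(n))


# Color classes of Frets's heads, in the order of t1, ..., t5.
FRETS_CLASSES = [
    indicator(4, vertex_class=(1, 2)),
    indicator(4, vertex_class=(3, 4)),
    indicator(4, edge_class=[(1, 2)]),
    indicator(4, edge_class=[(2, 3), (1, 4)]),
    indicator(4, edge_class=[(3, 4)]),
]


def frets_statistics(S):
    """Sufficient statistics t1, ..., t5 for the Frets's heads coloring."""
    return [inner(S, E) for E in FRETS_CLASSES]


S = [[5, 1, 2, 3],
     [1, 7, 4, 6],
     [2, 4, 11, 8],
     [3, 6, 8, 13]]
t = frets_statistics(S)
# Compare with the closed-form expressions in the text.
assert t == [S[0][0] + S[1][1], S[2][2] + S[3][3], 2 * S[0][1],
             2 * (S[1][2] + S[0][3]), 2 * S[2][3]]
print(t)
```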

The algebraic boundary of the cone of sufficient statistics $C_\mathcal{G}$ is computed
in [20] and given by the polynomial

$$H_\mathcal{G} = (t_1 - t_3)(t_1 + t_3)(t_2 - t_5)(t_2 + t_5) \cdot (4t_1^2t_5^2 - 4t_1t_2t_4^2 + t_4^4 + 8t_1t_2t_3t_5 - 4t_3t_4^2t_5 + 4t_2^2t_3^2).$$

For two observations the elimination ideal $I_{\mathcal{G},2}$ is the zero ideal. Therefore,
the MLE exists with probability 1 for two or more observations in this model.
For one observation we get

$$I_{\mathcal{G},1} = \langle 4t_1^2t_5^2 - 4t_1t_2t_4^2 + t_4^4 + 8t_1t_2t_3t_5 - 4t_3t_4^2t_5 + 4t_2^2t_3^2 \rangle,$$

which corresponds to one of the components of the algebraic boundary of
the cone of sufficient statistics. In this example, the algebraic boundary of
the cone of sufficient statistics intersects its interior. This is illustrated in
Figure 8. In order to get a graphical representation in three-dimensional
space, we fixed $t_3$ and $t_5$. The variety corresponding to $I_{\mathcal{G},1}$ is shown on the
left. We call this hypersurface the bow tie. The cone of sufficient statistics $C_\mathcal{G}$
is the convex hull of the bow tie and is shown in Figure 8 (right). Its boundary
consists of four planes corresponding to the components $t_1 - t_3$, $t_1 + t_3$, $t_2 - t_5$
and $t_2 + t_5$, shown in blue, and the bows of the bow tie, shown in yellow. The
black curves show where the planes touch the bow tie. Note that the upper
and lower two triangles of the bow tie lie in the interior of $C_\mathcal{G}$. Only the
two bows are part of the boundary of $C_\mathcal{G}$. So the MLE exists if the sufficient
statistic lies on one of the triangles of the bow tie, and it does not exist if the
sufficient statistic lies on one of the bows of the bow tie. Consequently, for
one observation the MLE exists with probability strictly between 0 and 1.
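One way to see why this quartic vanishes on the sufficient statistics of a single observation $x = (x_1, x_2, x_3, x_4)$, for which $S = xx^T$: put $P = (x_1 + x_2)(x_3 + x_4)$ and $Q = (x_1 - x_2)(x_3 - x_4)$. Then $t_4 = P - Q$, $P^2 = (t_1 + t_3)(t_2 + t_5)$ and $Q^2 = (t_1 - t_3)(t_2 - t_5)$, and eliminating $P$ and $Q$ from $(t_4^2 - P^2 - Q^2)^2 = 4P^2Q^2$ gives

$$(t_4^2 - 2t_1t_2 - 2t_3t_5)^2 - 4(t_1^2 - t_3^2)(t_2^2 - t_5^2),$$

which expands to the generator of $I_{\mathcal{G},1}$. The Python sketch below is our own check of these facts in exact integer arithmetic on random observations; it also confirms that $t_1 \pm t_3 = (x_1 \pm x_2)^2$ and $t_2 \pm t_5 = (x_3 \pm x_4)^2$ are nonnegative. It only shows that the quartic lies in $I_{\mathcal{G},1}$, not that it generates the ideal.

```python
import random


def one_observation_statistics(x):
    """t1..t5 of Example 6.2 for S = x x^T, i.e. a single observation x."""
    x1, x2, x3, x4 = x
    return (x1 * x1 + x2 * x2, x3 * x3 + x4 * x4, 2 * x1 * x2,
            2 * (x2 * x3 + x1 * x4), 2 * x3 * x4)


def bow_tie(t):
    """Generator of I_{G,1} as printed in the text."""
    t1, t2, t3, t4, t5 = t
    return (4 * t1**2 * t5**2 - 4 * t1 * t2 * t4**2 + t4**4
            + 8 * t1 * t2 * t3 * t5 - 4 * t3 * t4**2 * t5 + 4 * t2**2 * t3**2)


def bow_tie_factored(t):
    """Same quartic written as (t4^2 - 2 t1 t2 - 2 t3 t5)^2 - 4 (t1^2 - t3^2)(t2^2 - t5^2)."""
    t1, t2, t3, t4, t5 = t
    return (t4**2 - 2 * t1 * t2 - 2 * t3 * t5)**2 - 4 * (t1**2 - t3**2) * (t2**2 - t5**2)


rng = random.Random(0)
for _ in range(1000):
    x = [rng.randint(-50, 50) for _ in range(4)]
    t = one_observation_statistics(x)
    assert bow_tie(t) == 0  # the statistic lies on the bow tie
    t1, t2, t3, t4, t5 = t
    # t1 +- t3 and t2 +- t5 are squares, hence inside the four planes
    assert min(t1 - t3, t1 + t3, t2 - t5, t2 + t5) >= 0

# A generic point of R^5 does not satisfy the quartic; both forms agree there.
generic = (3, 2, 1, 1, 1)
print(bow_tie(generic), bow_tie_factored(generic))
```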

A different approach is to run simulations, for example, using cvx. We can
generate vectors of length four and compute the MLE by solving a convex
optimization problem. If cvx finds a solution, the MLE exists. For this ex-
ample, however, cvx sometimes does not find a solution, which supports the
hypothesis that the MLE exists with probability strictly between 0 and 1 for
one observation. In the following, we give a formal proof by characterizing
the set of vectors in $\mathbb{R}^4$ for which the MLE exists or does not exist.

For this example, we can exactly characterize not just the sufficient statistics
but also the observations for which the MLE exists. In other words,
we can characterize the observations whose sufficient statistics lie on the
triangles of the bow tie. First, note that by exchanging variables 1 and 2 and
simultaneously exchanging variables 3 and 4, we get the same model. This
means that from one observation $X_1 = (x_1, x_2, x_3, x_4)$ we can generate a
second observation $X_2 = (x_2, x_1, x_4, x_3)$. So the resulting data matrix is given by

$$X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 & x_3 & x_4 \\ x_2 & x_1 & x_4 & x_3 \end{pmatrix}.$$

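Applying Buhl's result about two observations on a Gaussian cycle [6], the
MLE exists if and only if the lines corresponding to the vectors

$$v_1 = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \quad v_2 = \begin{pmatrix} x_2 \\ x_1 \end{pmatrix}, \quad v_3 = \begin{pmatrix} x_3 \\ x_4 \end{pmatrix}, \quad v_4 = \begin{pmatrix} x_4 \\ x_3 \end{pmatrix}$$

are not graph consecutive. This is the case if and only if

$$|x_1| > |x_2| \text{ and } |x_3| > |x_4| \quad \text{or} \quad |x_1| < |x_2| \text{ and } |x_3| < |x_4|. \tag{8}$$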

Hence, the MLE for one observation exists if and only if the data are
inconsistent, meaning that the head of the first (second) son is longer than the
head of the second (first) son, but its breadth is smaller. In this situation
the corresponding sufficient statistics lie on the triangles of the bow tie in
Figure 8. Otherwise the corresponding sufficient statistics lie on the bows of
the bow tie. If $\Sigma$ is diagonal (so that, within the model, $\Sigma_{11} = \Sigma_{22}$ and
$\Sigma_{33} = \Sigma_{44}$), the MLE exists with probability 0.5: the four configurations of the
events $|x_1| > |x_2|$ and $|x_3| > |x_4|$ then have the same probability, and (8)
comprises two of them.
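Both statements can be checked numerically. The Python sketch below is our own illustration and rests on one assumption that we state explicitly: we read "graph consecutive" as saying that the lines spanned by $v_1, \dots, v_4$, met in order while rotating around the origin, appear in the cyclic order of the vertices 1, 2, 3, 4 of the cycle (up to rotation and reversal). Under this reading, the sketch confirms on random observations that Buhl's criterion and condition (8) agree, and it estimates the existence probability for a diagonal $\Sigma$ by Monte Carlo. The seeds, sample sizes and standard deviations are arbitrary choices.

```python
import math
import random


def graph_consecutive(vectors):
    """Assumed reading of Buhl's notion for planar vectors v_1, ..., v_p on a p-cycle.

    Lines through the origin are ordered by their angle modulo pi; they count as
    graph consecutive if this cyclic order equals 1-2-...-p up to rotation/reversal.
    """
    angles = [math.atan2(y, x) % math.pi for x, y in vectors]
    order = sorted(range(len(vectors)), key=lambda j: angles[j])
    cycle = list(range(len(vectors)))
    rotations = [order[i:] + order[:i] for i in range(len(order))]
    return cycle in rotations or cycle[::-1] in rotations


def mle_exists_buhl(x):
    """One observation x of Frets's model, doubled to two rows via the symmetry."""
    x1, x2, x3, x4 = x
    return not graph_consecutive([(x1, x2), (x2, x1), (x3, x4), (x4, x3)])


def mle_exists_condition_8(x):
    """Condition (8) on the absolute values of the observation."""
    a1, a2, a3, a4 = (abs(c) for c in x)
    return (a1 > a2 and a3 > a4) or (a1 < a2 and a3 < a4)


def existence_frequency(sd_length, sd_breadth, trials, seed):
    """Monte Carlo frequency of (8) when Sigma = diag(s^2, s^2, b^2, b^2)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = (rng.gauss(0, sd_length), rng.gauss(0, sd_length),
             rng.gauss(0, sd_breadth), rng.gauss(0, sd_breadth))
        hits += mle_exists_condition_8(x)
    return hits / trials


rng = random.Random(1)
samples = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(2000)]
print(all(mle_exists_buhl(x) == mle_exists_condition_8(x) for x in samples))
print(existence_frequency(2.0, 1.0, trials=100_000, seed=0))
```

The frequency printed last should be close to 0.5, in line with the argument above; with a non-diagonal $\Sigma$ the same function of the data can be simulated by drawing correlated observations instead.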

In our previous paper [20] we found the defining polynomial $H_\mathcal{G}$ of the
cone of sufficient statistics for all colored Gaussian graphical models on the
4-cycle that have the property that edges in the same color class connect
the same vertex color classes. Such models have been studied in [16] and are
of special interest because they are invariant under rescaling of variables in
the same vertex color class. In Tables 2 and 3, we complete the discussion
of [20] with the number of observations and the corresponding existence
probability of the MLE.

Table 2

Results on the number of observations and the MLE existence
probability for all colored Gaussian graphical models with some
symmetry restrictions (namely, edges in the same color class connect
the same vertex color classes) on the 4-cycle

Graph     K     1 obs.     2 obs.     ≥3 obs.

(1)–(9)   [Graph drawings, concentration matrices and entries not recoverable from the extracted text.]

Table 2
(Continued)

Graph     K     1 obs.     2 obs.     ≥3 obs.

(10)   $\begin{pmatrix} \lambda_1 & \lambda_2 & 0 & \lambda_3 \\ \lambda_2 & \lambda_1 & \lambda_3 & 0 \\ 0 & \lambda_3 & \lambda_1 & \lambda_4 \\ \lambda_3 & 0 & \lambda_4 & \lambda_1 \end{pmatrix}$     p = 1     p = 1     p = 1

(11)   $\begin{pmatrix} \lambda_1 & \lambda_3 & 0 & \lambda_4 \\ \lambda_3 & \lambda_2 & \lambda_4 & 0 \\ 0 & \lambda_4 & \lambda_1 & \lambda_5 \\ \lambda_4 & 0 & \lambda_5 & \lambda_2 \end{pmatrix}$     No?     p = 1     p = 1

(12)–(18)   [Graph drawings, concentration matrices and entries not recoverable from the extracted text.]

Table 3

All RCOP-models (introduced in [16]), that is, graphs with an
additional permutation property on the 4-cycle

Graph     K     1 obs.     2 obs.     ≥3 obs.

(1)–(6)   [Graph drawings, concentration matrices and entries not recoverable from the extracted text.]

For every colored 4-cycle, we computed the elimination ideal $I_{\mathcal{G},n}$ for $n = 1, 2, 3$.
If it is the zero ideal, we know from Theorem 3.3 that the MLE exists
with probability one. If $I_{\mathcal{G},n}$ is nonzero, we run simulations using cvx. If we
find examples for which the MLE exists and other examples for which it
does not exist, this indicates that the MLE exists with probability strictly
between 0 and 1 for $n$ observations. In cases where simulations do not yield
any counterexamples, we need to prove that the MLE indeed does not exist
by carefully analyzing the components corresponding to the ideal $I_{\mathcal{G},n}$. This
is the case for one observation on the graphs (9), (11), (14) and (17). Note
that the graphical models (9) and (11) are submodels of (14) and (17). So if
we prove that the MLE does not exist for one observation on the graphs (9)
and (11), this also follows for the graphs (14) and (17).
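The decision procedure just described can be summarized as a small function that maps the outcome of the elimination step and of the simulations to the labels used in Table 2. This Python sketch is our own summary, not code from the paper: it assumes that the elimination ideal and the simulation outcomes have already been computed elsewhere (for instance with a Gröbner basis package and cvx) and takes them as plain inputs. The label "undecided" is our addition for a case the procedure above does not discuss.

```python
def classify(ideal_is_zero, simulation_outcomes):
    """Table label for one colored graph and one number of observations n.

    ideal_is_zero:       True if the elimination ideal I_{G,n} is the zero ideal.
    simulation_outcomes: booleans, True if the solver found the MLE for a sample.
    """
    if ideal_is_zero:
        return "p = 1"       # Theorem 3.3: existence with probability one
    found = any(simulation_outcomes)
    failed = not all(simulation_outcomes)
    if found and failed:
        return "p in (0,1)"  # both behaviours observed in simulations
    if not found:
        return "No?"         # no counterexample; non-existence still needs a proof
    return "undecided"       # MLE always found although I_{G,n} is nonzero


print(classify(True, []))
print(classify(False, [True, False, False]))
print(classify(False, [False] * 20))
```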

If the cone $C_\mathcal{G}$ for the graphs (9) and (11) is a basic open semialgebraic set
(see, e.g., [1]), then $C_\mathcal{G}$ does not meet its algebraic boundary, and the MLE
does not exist for one observation. We therefore end with the following
conjecture, which would resolve the question marks in Table 2:

Conjecture 6.3. The cones $C_\mathcal{G}$ corresponding to the graphs (9) and (11)
are basic open semialgebraic sets.

7. Conclusion. In this paper, we explained the geometry of maximum 
likelihood estimation in Gaussian graphical models. The geometric picture 
can be translated into an algebraic criterion (Theorem 3.3), which allows us 
to find exact lower bounds on the number of observations needed for the ex- 
istence of the MLE (with probability 1). Theorem 3.3 holds for any Gaussian 
graphical model. However, the practical implementation of Theorem 3.3 is 
based on Groebner bases computations, which are extremely memory inten- 
sive. Theorem 5.1 and Corollary 5.3 show the power but also the limitations 
of computational algebraic geometry. We are, in practice, only able to apply 
the algebraic elimination criterion directly to very small graphs. One way 
of getting results for larger graphs is to find a clique decomposition into 
small subgraphs, which can be handled individually. A different future line 
of research is to use the small examples to understand the existence of the 
MLE asymptotically. If we fix a class of graphs, for example, cycles or grids, 
what can we say about the existence of the MLE as the number of vertices 
tends to infinity? Medium-sized graphs, however, remain untouched by both 
approaches, and finding the minimum number of observations needed for the 
existence of the MLE for such graphs is an interesting open problem. 